Mixture of Experts (MoE)

An architecture that behaves like a mixture of many small, specialized models (experts).

The neural network contains independent sub-networks (experts).

Depending on the input, a router (gating network) selects the most suitable experts and delegates the processing to them.
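The routing step can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `route_top_k` and `moe_forward`, the toy scalar experts, and the router scores are all hypothetical, and normalizing the weights with a softmax over only the selected scores is one common choice assumed here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_scores)),
                   key=lambda i: router_scores[i], reverse=True)
    top = order[:k]
    # Assumption: weights are a softmax over the selected scores only.
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the selected experts' outputs; other experts are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))

# Four toy "experts"; each is just a scalar function here.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

# Hypothetical router scores for one input; a real router computes them from x.
scores = [2.0, -1.0, 1.5, 0.0]

selected = route_top_k(scores, k=2)   # experts at index 0 and 2 are chosen
output = moe_forward(3.0, experts, scores, k=2)
```

With these scores the experts at index 0 and 2 are selected (Expert 1 and Expert 3 when counting from one), and the other two functions are never evaluated.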

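The remaining experts do not run, so each input needs less computation, and inference is faster than with a structure that runs the whole network (a dense model) of the same total size.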

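However, all experts, including the inactive ones, must be kept in memory, so the required VRAM is no lower than for a dense model with the same total parameter count.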
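The gap between compute and memory can be made concrete with a little arithmetic. The sketch below uses made-up numbers (8 experts, 100 million parameters each, 2 experts selected, 16-bit weights) and counts only the expert parameters, ignoring the router and any layers shared by all inputs.

```python
def moe_layer_params(num_experts, params_per_expert, top_k):
    """Return (parameters held in memory, parameters used per input)
    for the expert part of one MoE layer."""
    stored = num_experts * params_per_expert   # every expert stays resident
    active = top_k * params_per_expert         # only the selected experts run
    return stored, active

# Hypothetical sizes chosen only for illustration.
stored, active = moe_layer_params(num_experts=8,
                                  params_per_expert=100_000_000,
                                  top_k=2)

bytes_per_param = 2                            # assuming 16-bit weights
memory_gib = stored * bytes_per_param / 2**30  # memory follows the stored count
compute_fraction = active / stored             # compute follows the active count

print(f"stored={stored:,} active={active:,}")
print(f"memory ~ {memory_gib:.2f} GiB, compute fraction = {compute_fraction:.2f}")
```

Under these assumptions only a quarter of the expert parameters take part in each forward pass, while the memory footprint is that of all eight experts.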

code:mermaid
 graph TD
     Router -- top scores only --> E1[Expert 1]
     Router -- top scores only --> E3[Expert 3]
     Router -. not selected .-> E2[Expert 2]
     Router -. not selected .-> E4[Expert 4]
     E1 -- output --> Output[Output]
     E3 -- output --> Output
     classDef active fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
     classDef inactive fill:#f5f5f5,stroke:#9e9e9e,stroke-width:1px,color:#9e9e9e;
     class E1,E3,Output active;
     class E2,E4 inactive;